Abstract:
Image-to-image translation (I2I) has broad application prospects for assisting physicians with diagnosis in scenarios where medical images are missing. Considering that no medical I2I model has been constructed from a geometric view that simultaneously preserves local manifold values and global manifold structure, we propose an I2I model based on manifold-value correction and manifold matching (MMNet) to translate an image of one modality into another, in both paired and unpaired fashions, while preserving the texture details of the target modality image. For local manifold-value preservation, each manifold value of the generated image is aligned with that of the corresponding real image as closely as possible by jointly optimizing the distribution corrector and the distribution generator. For global manifold structure preservation, three distance metrics are defined to globally reduce the difference between the manifold of the generated images and that of the real images by optimizing the manifold matching loss. Experimental results demonstrate that the proposed MMNet outperforms multiple state-of-the-art GAN-based methods for MR image translation in both qualitative and quantitative measures.
Abstract:
In this paper, we propose a text matching method for document image retrieval without any language model. Two word images are first normalized to an appropriate size and image features are extracted using the local crowdedness method. Similarity between t
Abstract:
Similar image/shape retrieval has attracted increasing interest in recent years. A typical strategy of existing retrieval algorithms is to rank the images according to image-to-image similarities, e.g., the similarities between the query image and the images in the database. This strategy ignores the inherent information of the class that the query image belongs to (which we call the query class). To address this issue, rather than using image-to-image similarity, we propose a simple yet effective retrieval method based on exploring image-to-class similarity. The method uses an iterative framework in which the size of the query class is progressively enlarged according to the previous retrieval results, and the ranked list is generated according to the similarities between the images in the database and the query class. This framework enables us to explore the inherent information of the query class, and hence helps to improve the retrieval accuracy. Experimental results on various datasets demonstrate that our method effectively improves image and shape retrieval accuracy compared to state-of-the-art methods. (C) 2016 Elsevier B.V. All rights reserved.
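The iterative query-class enlargement this abstract describes can be sketched as follows. This is a minimal illustration, assuming cosine similarity as the image-to-image measure and an image-to-class similarity defined as the best similarity to any class member; the paper's exact similarity and enlargement rule may differ:

```python
import numpy as np

def image_to_class_retrieval(query_feat, db_feats, rounds=3, grow=5):
    """Sketch of iterative image-to-class retrieval: the query class
    starts as the query alone and is enlarged each round with the
    top-ranked database images; database images are re-ranked by
    their best cosine similarity to any member of the query class."""
    query_class = [query_feat]
    ranking = None
    for _ in range(rounds):
        sims = np.array([
            max(float(c @ f) / (np.linalg.norm(c) * np.linalg.norm(f) + 1e-9)
                for c in query_class)
            for f in db_feats
        ])
        ranking = np.argsort(-sims)  # most similar first
        # Enlarge the query class with the current top-ranked images.
        query_class = [query_feat] + [db_feats[i] for i in ranking[:grow]]
    return ranking
```

With a single round this reduces to plain image-to-image ranking; additional rounds let images similar to any already-retrieved class member rise in the list.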
Abstract:
To produce a complete 3D reconstruction of a large-scale architectural scene, both ground and aerial images are usually captured. A common approach is to first reconstruct the models from different image sources separately, and align them afterwards. Using this pipeline, this work proposes an accurate and efficient approach for ground-to-aerial model alignment in a coarse-to-fine manner. First, both the ground model and aerial model are transformed into the geo-referenced coordinate system using GPS meta-information for coarse alignment. Then, the coarsely aligned models are refined by a similarity transformation that is estimated based on 3D point correspondences between them, and the 3D point correspondences are determined in a 2D-image-matching manner by considering the rich textural and contextual information in the 2D images. Due to the dramatic differences in viewpoint and scale between ground and aerial images, which make matching them directly nearly impossible, we perform an intermediate view-synthesis step to mitigate the matching difficulty. To this end, the following three key issues are addressed: (a) selecting a suitable subset of aerial images to cover the ground model properly; (b) synthesizing images from the ground model under the viewpoints of the selected aerial images; and finally, (c) obtaining the 2D point matches between the synthesized images and the selected aerial images. The experimental results show that the proposed model alignment approach is quite effective and outperforms several state-of-the-art techniques in terms of both accuracy and efficiency. (C) 2017 Elsevier Ltd. All rights reserved.
Abstract:
Background: Exploring the correspondences across multi-view images is the basis of many computer vision tasks. However, most existing methods are limited in accuracy under challenging conditions. In order to learn more robust and accurate correspondences, we propose DSD-MatchingNet for local feature matching in this paper. First, we develop a deformable feature extraction module to obtain multi-level feature maps, which harvests contextual information from dynamic receptive fields. The dynamic receptive fields provided by the deformable convolution network enable our method to obtain dense and robust correspondences. Second, we utilize sparse-to-dense matching with the symmetry of correspondence to implement accurate pixel-level matching, which enables our method to produce more accurate correspondences. Experiments have shown that our proposed DSD-MatchingNet achieves better performance on image matching benchmarks, as well as on visual localization benchmarks. Specifically, our method achieves 91.3% mean matching accuracy on the HPatches dataset and 99.3% visual localization recall on the Aachen Day-Night dataset.
Abstract:
It is difficult to study body image in animals. In this study, it is assumed that the perception of the body of others reflects body image. The perception of the human face was examined in a series of six experiments with a chimpanzee. Delayed-matching-to-sample tasks were employed. Although the chimpanzee mastered the tasks and showed transfer of performance to new faces, subtle changes in the matching face resulted in the deterioration of performance. Responses of the chimpanzee were often controlled by factors other than the facial stimuli. Thus, although the chimpanzee has a body image as humans do, it may not be as clear and as segmented.
Abstract:
In this paper, we propose a novel method to precisely match two aerial images that were obtained in different environments via a two-stream deep network. By internally augmenting the target image, the network considers the two streams with the three input images and reflects the additional augmented pair in the training. As a result, the training process of the deep network is regularized and the network becomes robust to the variance of aerial images. Furthermore, we introduce an ensemble method based on the bidirectional network, which is motivated by the isomorphic nature of the geometric transformation. We obtain two global transformation parameters without any additional network or parameters, which alleviates asymmetric matching results and enables a significant improvement in performance by fusing the two outcomes. For the experiments, we adopt aerial images from Google Earth and the International Society for Photogrammetry and Remote Sensing (ISPRS). To quantitatively assess our results, we apply the probability of correct keypoints (PCK) metric, which measures the degree of matching. The qualitative and quantitative results show a sizable performance gap compared to conventional methods for matching aerial images. All code and our trained model, as well as the dataset, are available online.
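The PCK metric mentioned above can be computed as follows. This is a minimal sketch, assuming the common convention that a keypoint is counted as correct when its distance to the ground-truth location is within a fraction `alpha` of a reference image size; the exact normalization used in the paper may differ:

```python
import numpy as np

def pck(pred_pts, gt_pts, alpha=0.05, img_size=240):
    """Probability of Correct Keypoints: fraction of predicted
    keypoints lying within alpha * img_size pixels of the
    ground-truth keypoints."""
    dists = np.linalg.norm(pred_pts - gt_pts, axis=1)  # per-keypoint error
    return float(np.mean(dists <= alpha * img_size))
```

A higher PCK at a fixed `alpha` indicates that more keypoints were matched within the tolerance radius.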
Abstract:
The creation of an image from another image or from different types of data, including text, scene graphs, and object layouts, is one of the most challenging tasks in computer vision. In addition, capturing images from different views for generating an object or a product can be exhausting and expensive to do manually. Now, using deep learning and artificial intelligence techniques, the generation of new images from different types of data has become possible. To this end, a significant effort has been devoted recently to developing image generation strategies, with great achievements. In this paper we present, to the best of the authors' knowledge, the first comprehensive overview of existing image generation methods. Accordingly, each image generation technique is described based on the nature of the adopted algorithms, the type of data used, and the main objective. Moreover, each image generation category is discussed by presenting the proposed approaches. In addition, a presentation of existing image generation datasets is given. The evaluation metrics that are suitable for each image generation category are discussed, and a comparison of the performance of existing solutions is provided to better inform the state of the art and identify their limitations and strengths. Lastly, the current challenges facing this subject are presented.
Abstract:
Since existing locally controllable text-to-image generation methods cannot achieve satisfactory results in detail, a novel locally controllable text-to-image generation network based on visual-linguistic relation alignment is proposed. The goal of the method is to complete image processing and generation semantically through text guidance. The proposed method explores the relationship between text and image to achieve local control of text-to-image generation. Visual-linguistic matching learns similarity weights between image and text through semantic features to achieve fine-grained correspondence between local image regions and words. An instance-level optimization function is introduced into the generation process to accurately control the weights with low similarity and combine them with text features to generate new visual attributes. In addition, a local control loss is proposed to preserve the details of the text and the local regions of the image. Extensive experiments demonstrate the superior performance of the proposed method, which enables more accurate control of the original image.
Abstract:
Feature matching is a core step in feature-based multi-source remote sensing image registration approaches. However, existing methods, whether the classical SIFT algorithm or deep learning-based methods, essentially rely on generating descriptors from local regions around feature points, which can lead to low matching success rates due to various challenges, including gray-scale changes, content changes, local similarity, and occlusions between images. Inspired by the human approach of finding rough corresponding regions globally and then carefully comparing local regions, and by the excellent global attention property of transformers, the proposed feature matching network adopts a coarse-to-fine matching strategy that utilizes both global and local information between images to predict corresponding feature points. Importantly, the network has great flexibility in matching corresponding points for arbitrary feature points and can be effectively trained without strong supervision signals of corresponding feature points, requiring only the true geometric transformation between images. The qualitative experiments illustrate the effectiveness of the proposed network by matching feature points extracted by SIFT or sampled uniformly. In the quantitative experiments, we used feature points extracted by SIFT, SuperPoint, and LoFTR as the keypoints to be matched. We then calculated the mean match success ratio (MSR) and mean reprojection error (MRE) of each method at different thresholds on the test dataset, and plotted boxplots to visualize the distributions. Comparing the MSR and MRE values as well as their distributions with those of other methods, we conclude that the proposed method consistently outperforms the comparison methods in terms of MSR at different thresholds. Moreover, the MRE of the proposed method remains within a reasonable range compared to that of other methods.
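The MSR and MRE metrics used in the quantitative experiments above can be sketched as follows. This assumes the true geometric transformation between images is given as a 3x3 homography `H`; the helper names are illustrative, not taken from the paper:

```python
import numpy as np

def reprojection_errors(pts_src, pts_dst, H):
    """Project matched source points with the known transformation H
    and measure the pixel distance to their matched destination points."""
    ones = np.ones((len(pts_src), 1))
    proj = np.hstack([pts_src, ones]) @ H.T  # homogeneous projection
    proj = proj[:, :2] / proj[:, 2:3]        # back to Cartesian coordinates
    return np.linalg.norm(proj - pts_dst, axis=1)

def mean_reprojection_error(pts_src, pts_dst, H):
    """MRE: average reprojection error over all putative matches."""
    return float(np.mean(reprojection_errors(pts_src, pts_dst, H)))

def match_success_ratio(pts_src, pts_dst, H, threshold=3.0):
    """MSR: fraction of matches whose reprojection error falls
    below a pixel threshold."""
    errs = reprojection_errors(pts_src, pts_dst, H)
    return float(np.mean(errs <= threshold))
```

Sweeping `threshold` over several pixel values yields the MSR-at-different-thresholds comparison the abstract reports.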